Clustering is the most common method for organizing unlabeled data into its natural groups (called clusters), based on some measure of similarity among data objects. The Partitioning Around Medoids (PAM) algorithm is a partitioning-based clustering method widely used for object categorization, image analysis, bioinformatics, and data compression. However, because of its high time complexity, PAM is impractical for large datasets and for embedded or real-time applications. In this work, we propose a simple and scalable parallel architecture for the PAM algorithm to reduce its running time. This architecture can easily be implemented either on a multi-core processor system to handle big data or on a reconfigurable hardware platform, such as FPGAs and MPSoCs, which makes it suitable for real-time clustering applications. Our proposed model partitions the data equally among multiple processing cores. Each core executes the same sequence of tasks simultaneously on its respective data subset and shares intermediate results with the other cores to produce the final result. Experiments show that the computational complexity of the PAM algorithm is reduced exponentially as we increase the number of cores working in parallel. We also observe that the speedup curve of the proposed model becomes more linear as the number of data points increases and as the clusters become more uniform. The results further demonstrate that the proposed architecture produces the same results as the original PAM algorithm, but with reduced computational complexity.
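The data-parallel idea described above (equal partitions, identical per-core work, shared intermediate results) can be illustrated with a minimal Python sketch. It is not the architecture proposed in this work: the function names (`split_equally`, `partial_cost`, `parallel_pam`), the Manhattan distance, the naive choice of the first k points as initial medoids, the first-improvement swap rule, and the use of a thread pool as a stand-in for processing cores are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def manhattan(a, b):
    """Manhattan distance; exact for integer coordinates (assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def split_equally(points, n_parts):
    """Split points into n_parts contiguous chunks whose sizes differ by at most one."""
    q, r = divmod(len(points), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + q + (1 if i < r else 0)
        chunks.append(points[start:end])
        start = end
    return chunks


def partial_cost(chunk, medoids):
    """Per-worker task: distance from each point in the chunk to its nearest medoid, summed."""
    return sum(min(manhattan(p, m) for m in medoids) for p in chunk)


def total_cost(chunks, medoids, pool):
    """Combine the workers' intermediate results into the global clustering cost."""
    return sum(pool.map(lambda c: partial_cost(c, medoids), chunks))


def parallel_pam(points, k, n_workers=4):
    """Swap-based k-medoids search whose cost evaluation is split across workers."""
    chunks = split_equally(points, n_workers)
    medoids = list(points[:k])  # naive initialization (assumption, not from the source)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        best = total_cost(chunks, medoids, pool)
        improved = True
        while improved:
            improved = False
            # Try replacing each medoid with each non-medoid point.
            for i, cand in product(range(k), points):
                if cand in medoids:
                    continue
                trial = medoids[:i] + [cand] + medoids[i + 1:]
                cost = total_cost(chunks, trial, pool)
                if cost < best:  # accept the first improving swap
                    medoids, best, improved = trial, cost, True
    return sorted(medoids), best
```

Because each worker only contributes a partial sum over its own subset, the combined cost, and therefore every swap decision, is the same for any number of workers when the distances are exact. This mirrors the claim that the parallel version produces the same clustering as the serial one. In CPython, threads do not run CPU-bound code in parallel, so a real speedup would require processes, native code, or hardware parallelism.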